Method and system for on-line blind source separation

ABSTRACT

A method and apparatus are disclosed for performing blind source separation using convolutive signal decorrelation. For a first embodiment, the method accumulates a length of input signal (mixed signal) that includes a plurality of independent signals from independent signal sources. The invention then divides the length of input signal into a plurality of T-length periods (windows) and performs a discrete Fourier transform (DFT) on the signal within each T-length period. Thereafter, estimated cross-correlation values are computed using a plurality of the DFT values. A total number of K cross-correlation values are computed, where each of the K values is averaged over N of the T-length periods. Using the cross-correlation values, a gradient descent process computes the coefficients of a finite impulse response (FIR) filter that will effectively separate the source signals within the input signal. A second embodiment of the invention is directed to on-line processing of the input signal—i.e., processing the signal as soon as it arrives, with no storage of the signal data. In particular, an on-line gradient algorithm is provided for application to non-stationary signals, having an adaptive step size in the frequency domain based on second derivatives of the cost function. The on-line separation methodology of this embodiment is characterized as multiple adaptive decorrelation.

This application is a continuation in part of application Ser. No. 09/191,217, now U.S. Pat. No. 6,167,417, entitled "Convolutive Blind Source Separation Using A Multiple Decorrelation Method," filed Nov. 12, 1998.

This application claims the benefit of U.S. Provisional Application No. 60/151,838, filed Sep. 1, 1999, which is herein incorporated by reference.

FIELD OF THE INVENTION

The invention relates to signal processing and, more particularly, to a method and apparatus for performing on-line blind source signal separation.

BACKGROUND OF THE INVENTION

In the art of speech processing, it is necessary in some circumstances to separate a mixture of differently convolved and mixed signals—where those signals typically emanate from multiple sources—without a priori knowledge of those signals. Such a separation of a composite signal into its constituent component signals is known as blind source separation (BSS), and various BSS techniques are known in the art. These techniques are useful for separating source signals that are simultaneously produced by independent sources—e.g., multiple speakers, sonar arrays, and the like—and which signals are combined in a convolutive medium. BSS techniques may be applied in such applications as speech detection using multiple microphones, crosstalk removal in multichannel communications, multipath channel identification and equalization, direction of arrival (DOA) estimation in sensor arrays, improvement of beam-forming microphones for audio and passive sonar, and discovery of independent source signals in various biological signals, such as EEG, MEG and the like.

Most of the known BSS algorithms try to invert a multi-path acoustic environment by finding a multi-path finite impulse response (FIR) filter that approximately inverts the forward channel. However, a perfect inversion may require fairly long FIR filters—such situations particularly occurring in strongly echoic and reverberant rooms, where most, if not all, current algorithms fail. Additionally, changing forward channels due to moving sources, moving sensors, or changing environments require an algorithm that converges sufficiently quickly to maintain an accurate current inverse of the channel.

Two general types of BSS algorithms are known in the art for blind source separation of a convolutive mixture of broad-band signals: (1) algorithms that diagonalize a single estimate of the second order statistics, and (2) algorithms that identify statistically independent signals by considering higher order statistics. Algorithms of the first type generate decorrelated signals by diagonalizing second order statistics and have a simple structure that can be implemented efficiently. [See, e.g., E. Weinstein, M. Feder, and A. V. Oppenheim, "Multi-Channel Signal Separation by Decorrelation", IEEE Trans. Speech Audio Processing, vol. 1, no. 4, pp. 405-413, April 1993; S. Van Gerven and D. Van Compernolle, "Signal Separation by Symmetric Adaptive Decorrelation: Stability, Convergence, and Uniqueness", IEEE Trans. Signal Processing, vol. 43, no. 7, pp. 1602-1612, July 1995; K.-C. Yen and Y. Zhao, "Improvements on co-channel separation using ADF: Low complexity, fast convergence, and generalization", in Proc. ICASSP 98, Seattle, Wash., 1998, pp. 1025-1028; M. Kawamoto, "A method of blind separation for convolved non-stationary signals", Neurocomputing, vol. 22, no. 1-3, pp. 157-171, 1998; S. Van Gerven and D. Van Compernolle, "Signal separation in a symmetric adaptive noise canceler by output decorrelation", in Proc. ICASSP 92, 1992, vol. IV, pp. 221-224.] However, they are not guaranteed to converge to the right solution, as a single decorrelation is not a sufficient condition to obtain independent model sources. Instead, for stationary signals, higher order statistics have to be considered, either by direct measurement and optimization of higher order statistics [See, e.g., D. Yellin and E. Weinstein, "Multichannel Signal Separation: Methods and Analysis", IEEE Trans. Signal Processing, vol. 44, no. 1, pp. 106-118, 1996; H.-L. N. Thi and C. Jutten, "Blind source separation for convolutive mixtures", Signal Processing, vol. 45, no. 2, pp. 209-229, 1995; S. Shamsunder and G. Giannakis, "Multichannel Blind Signal Separation and Reconstruction", IEEE Trans. Speech Audio Processing, vol. 5, no. 6, pp. 515-528, Nov. 1997], or indirectly by making assumptions on the shape of the cumulative density function (cdf) of the signals [See, e.g., R. Lambert and A. Bell, "Blind Separation of Multiple Speakers in a Multipath Environment", in Proc. ICASSP 97, 1997, pp. 423-426; S. Amari, S. C. Douglas, A. Cichocki, and H. H. Yang, "Multichannel blind deconvolution using the natural gradient", in Proc. 1st IEEE Workshop on Signal Processing Adv. Wireless Comm., 1997, pp. 101-104; T. Lee, A. Bell, and R. Lambert, "Blind separation of delayed and convolved sources", in Proc. Neural Information Processing Systems 96, 1997]. The former methods are fairly complex and difficult to implement. The latter methods fail in cases where the assumptions on the cdf are not accurate.

A limited body of on-line BSS algorithms is also known, generally for the case of single decorrelation and for indirect higher order methods, and having the same limitations as their off-line counterparts. [See, e.g., E. Weinstein, M. Feder, and A. V. Oppenheim, "Multi-Channel Signal Separation by Decorrelation", IEEE Trans. Speech Audio Processing, vol. 1, no. 4, pp. 405-413, April 1993; S. Van Gerven and D. Van Compernolle, "Signal Separation by Symmetric Adaptive Decorrelation: Stability, Convergence, and Uniqueness", IEEE Trans. Signal Processing, vol. 43, no. 7, pp. 1602-1612, July 1995; K.-C. Yen and Y. Zhao, "Improvements on co-channel separation using ADF: Low complexity, fast convergence, and generalization", in Proc. ICASSP 98, Seattle, Wash., 1998, pp. 1025-1028; K.-C. Yen and Y. Zhao, "Adaptive Co-Channel Speech Separation and Recognition", IEEE Trans. Speech Audio Processing, vol. 7, no. 2, March 1999; S. Amari, S. C. Douglas, A. Cichocki, and H. H. Yang, "Novel On-Line Adaptive Learning Algorithms for Blind Deconvolution Using the Natural Gradient Approach", in Proc. 11th IFAC Symposium on System Identification, Kitakyushu City, Japan, July 1997, vol. 3, pp. 1057-1062; P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain", Neurocomputing, vol. 22, pp. 21-34, 1998.] Single decorrelation algorithms have also been described that purport to operate on non-stationary signals [See, e.g., M. Kawamoto, "A method of blind separation for convolved non-stationary signals", Neurocomputing, vol. 22, no. 1-3, pp. 157-171, 1998; T. Ngo and N. Bhadkamkar, "Adaptive blind separation of audio sources by a physically compact device using second-order statistics", in Proc. ICA '99, Loubaton, Cardoso, and Jutten, Eds., 1999, pp. 257-260; H. Sahlin and H. Broman, "Separation of real-world signals", Signal Processing, vol. 64, pp. 103-104, 1998]. However, the art is not believed to provide an on-line BSS method which yields fast convergence for non-static filters—an essential criterion, since the data may be visited only once.

Therefore, there is a need for a blind source separation technique that accurately and quickly performs convolutive signal decorrelation.

SUMMARY OF THE INVENTION

The disadvantages of the prior art are overcome by a method and apparatus that performs blind source separation using convolutive signal decorrelation by simultaneously diagonalizing second order statistics at multiple time periods in the frequency domain. More specifically, in a first embodiment, the invention accumulates a length (segment) of input signal that comprises a mixture of signals from independent signal sources. The invention then divides the length of input signal into a plurality of T-length periods (windows) and performs a discrete Fourier transform (DFT) on the mixed signal over each T-length period. Thereafter, the invention computes K cross-correlation power spectra that are each averaged over N of the T-length periods. Using the cross-correlation power values, a gradient descent process computes the coefficients of an FIR filter that will effectively separate the source signals within the input signal by simultaneously decorrelating the K cross-correlation power spectra.

To achieve an accurate solution, the gradient descent process is constrained in that the time-domain values of the filter coefficients can attain only certain values—i.e., the time-domain filter coefficient values W(τ) are constrained within the T-length period to be zero for any time τ > Q, where Q << T. In this manner, the so-called "permutation problem" is solved and a unique solution for the FIR filter coefficients is computed such that a filter produced using these coefficients will effectively separate the source signals.

For circumstances where it is not practical to accumulate a length of input signal for processing according to the method of the first embodiment, a second embodiment of the invention is directed to on-line processing of the input signal—i.e., processing the signal as soon as it arrives, with no storage of the signal data. In particular, an on-line gradient algorithm is provided for application to non-stationary signals, having an adaptive step size in the frequency domain based on second derivatives of the cost function. The on-line separation methodology of this embodiment is characterized as multiple adaptive decorrelation (MAD).

Generally, either embodiment of the invention may be implemented as a software routine that is stored in a storage medium and executed on a general purpose computer system. However, a hardware implementation is readily apparent from the following detailed description.

The present invention finds application in a voice recognition system as a signal preprocessor system for decorrelating signals from different sources such that a voice recognition processor can utilize the various voice signals without other interfering noise sources. In response to the voice signal, the voice recognition processor can then produce computer commands or computer text.

BRIEF DESCRIPTION OF THE DRAWINGS

The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

FIG. 1 depicts a flow diagram of an embodiment of the present invention;

FIG. 2 depicts a flow diagram of another embodiment of the present invention;

FIG. 3 depicts a schematic frequency domain graph of filter coefficients generated by an embodiment of the present invention;

FIG. 4 depicts a schematic time domain graph of the filter coefficients generated by an embodiment of the present invention; and

FIG. 5 depicts a system for executing a software implementation of the present invention.

DETAILED DESCRIPTION OF INVENTION

An important aspect of the methodology of the invention is the recognition that, with non-stationary signals, the non-stationarity can be exploited to decorrelate the estimated sources in a BSS environment at multiple times—i.e., at such multiple times, one will have different correlations. With the computation of the cross correlations among the multiple non-stationary signals, a different cross correlation measure will be determined at a first time than at a second, later time. In this non-stationary, multiple source environment, a separation algorithm according to the method of the invention operates on the temporally-separated multiple cross correlations to simultaneously diagonalize them. Two embodiments of the method of the invention are described below. In a first embodiment, all cross correlations are first computed from the entire signal. A gradient descent algorithm then computes the best diagonalizing filters, with which the signals are subsequently filtered. In the second embodiment, a single running estimate of the changing cross correlation is maintained. An on-line gradient descent algorithm updates filters to diagonalize the current cross correlation. The updated filters are immediately applied to the incoming signal.

The basic source separation problem is simply described by assuming d_s non-stationary decorrelated source signals s(t) = [s₁(t), . . . , s_{d_s}(t)]^T that have been convolved and mixed in a linear medium. These signals are observed in a multi-path environment A(τ) of order P as x(t) = [x₁(t), . . . , x_{d_x}(t)]^T (d_x representing the number of sensor signals). It is further assumed that d_s ≤ d_x—i.e., there are at least as many sensors as sources. The convolved, noisy signal is represented in the time domain by the following equation (known as a forward model):

$x(t) = \sum_{\tau=0}^{P} A(\tau)\, s(t - \tau) + n(t)$  (1)

where n(t) represents additional sensor noise. Source separation techniques are used to identify the d_x d_s P coefficients of the channels A and to ultimately determine an estimate ŝ(t) for the unknown source signals.

Alternatively, under certain conditions on the coefficients of A(τ), the multi-path channel is invertible with a finite impulse response (FIR) multi-path filter W(τ) of suitably chosen length Q. That inverse model (known as the backward model) is represented by the following equation:

$\hat{s}(t) = u(t) = \sum_{\tau=0}^{Q} W(\tau)\, x(t - \tau)$  (2)

The method of the invention estimates the values of the parameters W for the backward model of equation 2 by assuming non-stationary source signals and using a least squares (LS) optimization to estimate W, as well as the signal and noise powers.
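By way of illustration only, the following numpy sketch implements the forward model of equation 1 and the backward model of equation 2 directly in the time domain. It is a minimal sketch, not part of the disclosure; the function names and the tap-major array layout for A(τ) and W(τ) are assumptions of the sketch.

    import numpy as np

    def forward_model(A, s, noise_std=0.0):
        """Convolutive mixing per equation 1: x(t) = sum_tau A(tau) s(t-tau) + n(t).

        A: (P+1, d_x, d_s) array of multi-path channel taps A(0..P)
        s: (d_s, n_samples) source signals
        """
        P1, d_x, d_s = A.shape
        n_samples = s.shape[1]
        x = np.zeros((d_x, n_samples))
        for tau in range(P1):
            x[:, tau:] += A[tau] @ s[:, :n_samples - tau]
        return x + noise_std * np.random.randn(d_x, n_samples)

    def backward_model(W, x):
        """Separation per equation 2: s_hat(t) = sum_tau W(tau) x(t-tau)."""
        Q1 = W.shape[0]
        s_hat = np.zeros((W.shape[1], x.shape[1]))
        for tau in range(Q1):
            s_hat[:, tau:] += W[tau] @ x[:, :x.shape[1] - tau]
        return s_hat

These direct time-domain loops are adequate for short filters; the embodiments below move the heavy computation to the frequency domain.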

I. Description Of An Off-Line Multiple Decorrelation Embodiment

An embodiment of the invention directed to off-line source separation using multiple decorrelation is described hereafter in conjunction with a flow chart for that embodiment depicted in FIG. 1. With reference to the flow chart, the convolutive (mixed) signal is input at step 100, and, at step 101, the signal is parsed into a plurality of windows containing T samples of the input signal X(t). The routine produces a discrete Fourier transform (DFT) value for each window χ(t)—i.e., one DFT value for each window of length T samples.

At step 102, the method of the invention uses the DFT values to accumulate K cross-correlation power spectra, where each of the K spectra is averaged over N windows of length T samples.

For non-stationary signals, the cross correlation estimates will be dependent on the absolute time and will indeed vary from one estimation segment (an NT period) to the next. The cross correlation estimates computed in step 104 are represented as:

$\hat{R}_x(t, \nu) = \frac{1}{N} \sum_{n=0}^{N-1} \chi(t + nT, \nu)\, \chi^H(t + nT, \nu)$  (3)

where χ(t + nT, ν) is the FFT of X(t + nT), with X(t) = [x(t), . . . , x(t + T − 1)]—i.e., χ(ν) is the FFT of the input signal within a window containing T samples. As such, the routine, at step 104, computes a matrix for each time t and for each frequency ν, and then averages these matrices element-wise over the N windows. Steps 106, 108, 110 and 112 iterate the correlation estimation of step 104 over n = 0 to N and k = 0 to K to produce the K spectra.
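A minimal sketch of this estimation step, assuming sequential non-overlapping windows and the array conventions of the earlier sketch (the function name is illustrative):

    import numpy as np

    def cross_power_spectra(x, T, N, K):
        """Equation 3: K cross-power spectra, each averaged over N windows
        of T samples.  x: (d_x, n_samples) mixed signal, n_samples >= K*N*T.
        Returns R_hat of shape (K, T, d_x, d_x): one d_x-by-d_x matrix per
        estimation segment k and frequency bin v."""
        d_x = x.shape[0]
        R_hat = np.zeros((K, T, d_x, d_x), dtype=complex)
        for k in range(K):
            for n in range(N):
                start = (k * N + n) * T
                chi = np.fft.fft(x[:, start:start + T], axis=1)  # (d_x, T)
                # accumulate chi(v) chi^H(v) for every frequency bin v
                R_hat[k] += np.einsum('iv,jv->vij', chi, chi.conj()) / N
        return R_hat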

Equation 3 can then be simplified to a matrix representation:

$\hat{R}_x(t, \nu) = A(\nu)\, \Lambda_s(t, \nu)\, A^H(\nu) + \Lambda_n(t, \nu)$  (4)

If N is sufficiently large, Λ_s(t, ν) and Λ_n(t, ν) can be modeled as diagonal matrices due to the signal independence assumption. For Equation 4 to be linearly independent for different times, it will be necessary that Λ_s(t, ν) changes over time—i.e., the signals are non-stationary.

Using the cross correlation estimates of equation 4, the invention computes the source signals using cross-power-spectra satisfying the following equation:

$\Lambda_s(t, \nu) = W(\nu)\, (\hat{R}_x(t, \nu) - \Lambda_n(t, \nu))\, W^H(\nu)$  (5)

In order to obtain independent conditions for every time period, the time periods are generally chosen to have non-overlapping estimation times for $\hat{R}_x(t_k, \nu)$—i.e., t_k = kTN. But if the signals vary sufficiently fast, overlapping estimation times may be utilized. Furthermore, although the windows T are generally sequential, the windows may overlap one another such that each DFT value is derived from signal information that is also contained in the previous window. In an audio signal processing system, the specific value of T is selected based upon the acoustics of the room in which the signals are recorded. For example, a large room with many reverberation paths requires a long window T such that the invention can process a substantial amount of signal information to achieve source signal separation. The value of N is generally determined by the amount of available data for processing. Typical values are N = 20, T = 1024 samples and K = 5 (so that, for example, each filter estimate is computed from KNT = 102,400 samples of input signal).

The inventive method computes a multipath channel W (i.e., the tap values of a multidimensional FIR filter) that simultaneously satisfies equation 5 for K estimation periods, e.g., 2 to 5 estimation periods for processing audio signals. Such a process is performed at steps 114, 116, 118 (collectively a filter parameter estimation process 124) and is represented using a least squares estimation procedure as follows:

$\hat{W}, \hat{\Lambda}_s, \hat{\Lambda}_n = \underset{W, \Lambda_s, \Lambda_n}{\operatorname{argmin}} \sum_{\nu=1}^{T} \sum_{k=1}^{K} \| E(k, \nu) \|^2 \quad \text{subject to } W_{ii}(\nu) = 1 \text{ and } W(\tau) = 0 \text{ for } \tau > Q$  (6)

where

$E(k, \nu) = W(\nu)\, (\hat{R}_x(k, \nu) - \Lambda_n(k, \nu))\, W^H(\nu) - \Lambda_s(k, \nu)$

For simplicity, a short-form nomenclature has been used in equation 6, where Λ_s(k, ν) = Λ_s(t_k, ν) and Λ_s = Λ_s(t₁, ν), . . . , Λ_s(t_K, ν), and the same simplified notation also applies to Λ_n(t, ν) and R_x(t, ν).

To produce the parameters W, a gradient descent process 124 (containing steps 114, 116, 118, and 120) is used that iterates the values of W as cost function (6) is minimized. In step 116, the W values are updated as W^new = W^old − μ∇_W E, where ∇_W E is the gradient of the cost function and μ is a weighting constant that controls the size of the update. More specifically, the gradient descent process determines the gradient values as follows:

$\frac{\partial E}{\partial W^*(\nu)} = \sum_{k=1}^{K} E(k, \nu)\, W(\nu)\, B^H(k, \nu)$  (7)

$\frac{\partial E}{\partial \hat{\Lambda}_s^*(k)} = -\operatorname{diag}(E(k, \nu))$  (8)

$\frac{\partial E}{\partial \hat{\Lambda}_n^*(k)} = -\operatorname{diag}(W^H(\nu)\, E(k, \nu)\, W(\nu))$  (9)

$B(k, \nu) \equiv \hat{R}_x(k, \nu) - \Lambda_n(k, \nu)$  (10)

With equation 8 set equal to zero, the routine can solve explicitly for the parameters Λ_s(k, ν), while the parameters Λ_n(k, ν) and W(ν) are computed with a gradient descent rule—e.g., new values of Λ_n(k, ν) and W(ν) are computed with each pass through the routine until the new values of W(ν) are not very different from the old values of W(ν)—i.e., W is converged.
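One pass of this procedure might be sketched as follows, with Λ_s eliminated explicitly by setting equation 8 to zero and the noise powers Λ_n supplied as a known (possibly zero) input; the names and per-bin array layout are assumptions of the sketch, and the time-domain constraint discussed next is applied separately.

    import numpy as np

    def gradient_step(W, R_hat, Lam_n, mu):
        """One iteration of the filter estimation process (equations 6-10).

        W:     (T, d_s, d_x) frequency-domain filters, one matrix per bin
        R_hat: (K, T, d_x, d_x) cross-power spectra from equation 3
        Lam_n: (K, T, d_x, d_x) diagonal noise power estimates
        """
        WH = W.conj().transpose(0, 2, 1)
        grad = np.zeros_like(W)
        for k in range(R_hat.shape[0]):
            B = R_hat[k] - Lam_n[k]                  # equation 10
            WBWH = W @ B @ WH
            # with equation 8 set to zero, Lam_s is the diagonal of W B W^H,
            # so the error E keeps only the off-diagonal part
            diag = np.diagonal(WBWH, axis1=1, axis2=2)
            E = WBWH - np.eye(W.shape[1]) * diag[:, None, :]
            grad += E @ W @ B.conj().transpose(0, 2, 1)   # equation 7
        return W - mu * grad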

Note that equation 6 contains an additional constraint on the filter size in the time domain. Up to that constraint, it would seem the various frequencies ν = 1, . . . , T represent independent problems. The solutions W(ν), however, are restricted to those filters that have no time response beyond τ > Q, where Q << T. Effectively, the routine parameterizes T d_s d_x filter coefficients in W(ν) with Q d_s d_x parameters W(τ). In practice, the values of W are produced in the frequency domain, at step 114, e.g., W(ν); then, at step 118, an inverse FFT is performed on these frequency domain values to convert the values of W(ν) to the time domain, e.g., W(τ). In the time domain, any W value that appears at a time greater than a time Q is set to zero and all values in the range below Q are not adjusted. The adjusted time domain values of W are then converted using an FFT back to the frequency domain. By zeroing the filter response in the time domain for all time greater than Q, the frequency response of the filter is smoothed such that a unique solution at each frequency is readily determined.
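The constraint step itself is compact. A minimal sketch, assuming the (T, d_s, d_x) layout used above:

    import numpy as np

    def constrain_filter(W_freq, Q):
        """Zero the time response beyond lag Q (W(tau) = 0 for tau > Q << T):
        inverse FFT to the time domain, truncate, FFT back.
        W_freq: (T, d_s, d_x) frequency-domain filter coefficients."""
        W_time = np.fft.ifft(W_freq, axis=0)
        W_time[Q + 1:] = 0.0          # discard all taps at lags greater than Q
        return np.fft.fft(W_time, axis=0)

Applying such a projection after each update enforces the τ > Q constraint of equation 6.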

FIG. 3 depicts an illustration of two frequency responses 302A and 302B, and FIG. 4 depicts an illustration of their corresponding time-domain responses 304A and 304B. The least squares solutions for the coefficients are found using a gradient descent process performed at step 124, such that an iterative approach is used to determine the correct values of W. Once the gradient in equation 7 becomes "flat", as identified in step 120, the routine, at step 122, applies the computed filter coefficients to an FIR filter. The FIR filter is used to filter the samples of the input (mixed) signal x(t) in the time period KNT samples in length. The FIR filter generates, at step 126, the decorrelated component signals of the mixed signal. Then, the routine, at step 128, gets the next KNT samples for processing and proceeds to step 100 to filter those samples. The previous KNT samples are removed from memory.

II. Description Of An On-Line Multiple Decorrelation Embodiment

With the off-line multiple decorrelation method of the prior embodiment, an entire signal segment (of at least several seconds) is divided into different portions—K estimation periods—with the cross correlations for the portions being computed. Then the separation algorithm of that embodiment operates to simultaneously make all decorrelations diagonal. The essential point here is that, with that embodiment, it is necessary to have the entire signal segment available to the algorithm—otherwise the measurement of the cross correlations at different times would not be possible.

It should readily be understood that this can be a significant limitation in some applications. For many BSS applications involving speech signals, a file representing even several seconds of multiple channels of speech data would be quite large—on the order of megabytes. Storage of data files of this size, particularly in high-speed memory, will often be impractical. Accordingly, a method is provided herein for on-line processing of convolved signal data from multiple sources—i.e., processing such data as soon as it arrives, with no storage of the data. The on-line multiple decorrelation methodology described hereafter embodies the advantages of using temporally-separated multiple decorrelations, while avoiding the necessity for storage of the input data.

A. On-line Multiple Decorrelation Methodology

Like the decorrelation algorithm of the prior embodiment, the algorithm of this embodiment is a gradient descent algorithm that operates to optimize a particular criterion—that criterion being the decorrelation of the cross correlations across time. [It should be understood, however, that the gradient descent methodology for minimizing a cost function is simply a preferred embodiment of the invention. Other known cost-function optimization methods are equally contemplated within the scope of the invention.] In the operation of the methodology of the invention, the gradient descent algorithm operates to minimize a cost function defined in terms of the separation filters, and to thereby establish the parameters for the filters.

The decorrelation algorithm of the invention operates to provide a running estimate of the cross-correlation matrix—i.e., upon the arrival of a new set of data, the cross-correlation estimate for the prior data set is updated to become the current cross-correlation estimate for the signal data. That running estimate can be used to compute a stochastic gradient of the filter parameter optimization criterion—stochastic gradient being intended to connote a gradient descent algorithm which takes a gradient step for every t instead of summing the total gradient before updating.

Development of the decorrelation algorithm of the invention is shown in the next section. Prior to that discussion, however, the application of that algorithm according to the method of the invention will be functionally described in conjunction with a flow chart for this embodiment of the invention, as shown in FIG. 2.

In the operation of the methodology of the invention, a plurality of input signals are divided into sequences of windowed segments. The windowed segments are input to the invention at step 200 of the flow chart. As each set of windowed segments is input, the signal segments in that window are transformed to the frequency domain using a Discrete Fourier Transform (DFT) at step 201. The DFT transforms of the windowed signal components are then used, at step 202, to compute the estimated cross-correlation matrices—i.e., for every frequency component in the signal segment, a cross-correlation matrix is computed. Using a gradient descent process, gradients corresponding to each of the cross correlation matrices are then determined at step 203. These gradients are then used to compute filter parameter updates, at step 204, according to the decorrelation algorithm of the invention. Note that the BSS environment will include multiple signal inputs and multiple signal outputs, and the decorrelation algorithm of the invention provides a distinct filter between every input and every output. Accordingly, sets of W filter coefficients are developed for a matrix of filters by the decorrelation algorithm.

In practice, the filter parameter values, W, may include unwanted coefficients that actually reduce the efficacy of the filter in separating out the desired signal. Thus, in a preferred embodiment, those unwanted coefficients are eliminated, at step 205, by a temporary conversion of the frequency domain W values (using inverse DFT) back to the time domain. There, any W value that appears at a time greater than a time Q (equal to the filter length) is set to zero and all values in the range below Q are left unchanged. The constrained time domain values of W are then converted back to the frequency domain, using DFT. By zeroing the filter response in the time domain for all time greater than Q, the frequency response of the filter is smoothed such that a unique solution at each frequency is readily determined.

In forming the running estimate of the filter parameter updates, the cost function is progressively minimized, using the cross-correlation measures for each successive windowed segment input to the cost function, to find the filter coefficients for the current segment. That current filter is applied to the input signal at step 207 to provide a decorrelation of the desired signal. Concurrently, the process of the invention returns, at step 208, to the input step, with the inputting of the next windowed signal segment.

B. Derivation of On-line Multiple Decorrelation Algorithm

(1) Basic Algorithm

As discussed hereinabove, non-stationary source signals can be recovered by optimizing filter coefficients W such that the estimated sources ŝ(t) have diagonal second order statistics at different times:

$\forall t, \tau: \; E[\hat{s}(t)\, \hat{s}^H(t - \tau)] = \hat{\Lambda}_s(t, \tau)$  (13)

Here $\hat{\Lambda}_s(t, \tau) = \operatorname{diag}([\hat{\lambda}_1(t, \tau), \ldots, \hat{\lambda}_{d_s}(t, \tau)])$ represents the autocorrelations of the source signals at times t, which have to be estimated from the data. This criterion determines the estimate ŝ(t) at best up to a permuted and convolved version of the original sources s(t). In the off-line multiple decorrelation embodiment, heretofore described, second-order statistics at multiple times were estimated and simultaneously diagonalized in the frequency domain. For the on-line case of the present embodiment, the cost function needed to develop the filter parameters is initially developed in the time domain directly, and the algorithm is thereafter transferred into the frequency domain to obtain a more efficient and faster-converging on-line update rule. The sample average starting at time t is used for the expectation—i.e., $E[f(t)] = \sum_{\tau'} f(t + \tau')$. A separation criterion can then be defined for simultaneous diagonalization with:

$J(W) = \sum_t J(t, W)$  (14)

$J(t, W) = \sum_\tau \left\| E[\hat{s}(t)\, \hat{s}^H(t - \tau)] - \hat{\Lambda}_s(t, \tau) \right\|^2$  (15)

$= \sum_\tau \left\| \sum_{\tau'} \hat{s}(t + \tau')\, \hat{s}^H(t - \tau + \tau') - \hat{\Lambda}_s(t, \tau) \right\|^2$  (16)

where the Frobenius norm given by $\|J\|^2 = \operatorname{Tr}(J J^H)$ is used. One may now search for a separation filter by minimizing J(W) with a gradient algorithm. Specifically, a stochastic gradient approach is followed, where the minimization takes a gradient step $\partial J(t, W) / \partial W(l)$ for every t (as opposed to summing the total gradient before updating). [Note that the optimal estimates of the autocorrelation Λ_s(t, τ) for a given W are the diagonal elements of $E[\hat{s}(t)\, \hat{s}^H(t - \tau)]$. Therefore only the gradient with respect to W is needed.] That stochastic gradient approach leads to the following expression:

$\frac{\partial J(t, W)}{\partial W(l)} = \sum_\tau \left( \sum_{\tau'} \hat{s}(t + \tau')\, \hat{s}^H(t + \tau' - \tau) - \hat{\Lambda}_s(t, \tau) \right) \sum_{\tau''} \hat{s}(t + \tau'' - \tau)\, \hat{x}^H(t + \tau'' - l) + \sum_\tau \left( \sum_{\tau'} \hat{s}(t + \tau' - \tau)\, \hat{s}^H(t + \tau') - \hat{\Lambda}_s(t, \tau) \right) \sum_{\tau''} \hat{s}(t + \tau'')\, \hat{x}^H(t + \tau'' - \tau - l)$

To simplify this expression, it can be shown that the first and second sums over τ can be made equal. In the gradient descent procedure, one may choose to apply the different gradient terms in these sums at times other than the time t. Following that argument, t can be replaced with t − τ in the second sum, which effectively uses the value of that sum in the gradient update at time t − τ. In addition, with the assumption of symmetry for the τ sums over positive and negative values, the sign of τ can be changed in the second sum. It is also reasonable to suggest that the diagonal matrix Λ_s(t, τ) remains unchanged by these transformations, at least in a quasi-stationary approximation. The resulting filter parameter update at time t for lag l with a step size of μ simplifies to:

$\Delta_t W(l) = -2\mu \sum_\tau \left( \sum_{\tau'} \hat{s}(t + \tau')\, \hat{s}^H(t + \tau' - \tau) - \hat{\Lambda}_s(t, \tau) \right) \sum_{\tau''} \hat{s}(t + \tau'' - \tau)\, \hat{x}^H(t + \tau'' - l)$  (17)

The sums over τ′ and τ″ represent the averaging operations, while the sum over τ stems from the correlation in Equation 13. The estimated cross-correlation of the sensor signals may be denoted as:

$\hat{R}_x(t, \tau) = E[\hat{x}(t)\, \hat{x}^H(t - \tau)]$  (18)

By inserting Equation 2 into Equation 17, and using the relationship of Equation 18, an expression is obtained for the filter parameter update at time t, as:

$\Delta_t W = -\mu\, J(t) * W \star \hat{R}_x(t), \quad \text{with} \quad J(t) = W * \hat{R}_x(t) \star W^H - \hat{\Lambda}_s(t)$  (19)

In the shorthand notation of this update expression, convolutions are represented by * and correlations by ⋆, and time lag indices are omitted. In the derivation of this update expression, it is assumed that the estimated cross-correlations do not change much within the time scale corresponding to one filter length, i.e., $\hat{R}_x(t, \tau) \approx \hat{R}_x(t + l, \tau)$ for 0 < l ≤ Q.

(2) Frequency Domain Conversion

In the signal processing arts, it is known to operate in the frequency domain, rather than the time domain, in order to enhance the efficiency of a computational algorithm. Accordingly, a set of time-domain signal data may be converted to the frequency domain for processing of that data. This motivation is particularly applicable for BSS algorithms because the filters typically used for separation of the signal data essentially implement convolutions, and such convolutions are much more efficiently handled in the frequency domain than in the time domain.

To that end, an on-line frequency domain implementation of the basic time-domain gradient algorithm is described below. In the development of that frequency domain implementation, an effort is made to reduce the computational cost as well as improve the convergence properties of the gradient updates.

Because the convolutions in Equation 19 are expensive to compute, this gradient expression is transformed into the frequency domain with T frequency bins. If one assumes small filters compared to the number of frequency components, i.e., Q << T, the convolutions factor approximately. Thus, to a good approximation, the stochastic gradient update rule of Equation 19 is transformed to the frequency domain as:

$\Delta_t W(\omega) = -2\mu\, J(t, \omega)\, W(\omega)\, \hat{R}_x(t, \omega), \quad \text{with} \quad J(t, \omega) = W(\omega)\, \hat{R}_x(t, \omega)\, W^H(\omega) - \hat{\Lambda}_s(t, \omega)$  (20)

Here W(ω), representing the filter parameters for the signal frequency component under consideration, and $\hat{R}_x(t, \omega)$, representing the running estimate of the cross correlation, are, respectively, the T-point discrete Fourier transforms of W(τ) and R_x(t, τ); t represents the time at which a current estimate is made.
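A minimal sketch of this update for a single time t, before the step-size normalization introduced below, under the assumption of stacked per-frequency matrices (names are illustrative):

    import numpy as np

    def online_update(W, R_hat, Lam_s, mu):
        """Stochastic gradient step of equation 20 at one time t.

        W, R_hat: (T, d, d) per-bin filters and running cross-correlation
        estimates; Lam_s: (T, d, d) diagonal source power estimates."""
        J = W @ R_hat @ W.conj().transpose(0, 2, 1) - Lam_s   # J(t, w)
        return W - 2 * mu * J @ W @ R_hat                     # equation 20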

It is noted that the same expression can be obtained directly in the frequency domain by using the gradient of $\sum_{t, \omega} J(t, W) = \sum_{t, \omega} \| J(t, \omega) \|^2$ with respect to W*. In a gradient rule with complex variables, this represents the combination of the partial derivatives with respect to the real and imaginary parts of W. In this formalism, the update rule for a complex parameter w with learning rate μ is $\Delta w = -2\mu\, \partial J / \partial w^*$.

(3) Adaptive Power Normalization

In a further refinement of this embodiment of the invention, it is preferred that some second order gradient expressions be considered, a refinement which is expected to improve the convergence properties of the decorrelation algorithm of the invention. A proper Newton-Raphson update requires the inverse of the Hessian matrix. Computing the exact inverse Hessian seems to be quite difficult in this case. A common approach is to neglect the off-diagonal terms of the Hessian. This should give efficient gradient updates in the case that the coefficients are not strongly coupled. If the constraints on the filter size in the time domain are ignored, the approximate frequency domain gradient updates depend only on W(ω), as can be seen in Equation 20. The parameters are therefore decoupled for different frequencies. However, the several elements of W(ω) at a single frequency may be strongly dependent, in which case a diagonal approximation of the Hessian would be quite poor. Accordingly, it appears unwise to modify the gradient directions of the matrix elements W(ω) for a given frequency. Instead, for the preferred embodiment, the original gradient is followed, but the step size is adapted with a normalization factor h(t, ω) for different frequencies. Thus the update expression becomes:

$\Delta_t W(\omega) = -\mu\, h^{-1}(t, \omega)\, \frac{\partial J(t, \omega)}{\partial W^*(\omega)}$  (21)

where μ is a fixed learning constant, and h is a weighting factor for the step-size adaptation.

An appropriate step size normalization factor can be determined as follows. For a real valued square cost, J(z) = a z z*, in the complex plane, the proper second order gradient step corresponds to $(\partial^2 J(z) / \partial z\, \partial z^*)^{-1}\, \partial J(z) / \partial z^* = z$. The corresponding expression for the current cost function can be computed to be:

$\frac{\partial^2 J}{\partial W_{ij}^*\, \partial W_{ij}} = 2 \left( (W \hat{R}_x)^H (W \hat{R}_x) \right)_{jj}$

It is real valued and independent of i. It is preferred that the same step size be used for all j, and therefore a sum over j is used. The step size normalization factor becomes:

$h(t, \omega) = \sum_j \frac{\partial^2 J(t, \omega)}{\partial W_{ij}^*(\omega)\, \partial W_{ij}(\omega)} = 2\, \| W(\omega)\, \hat{R}_x(t, \omega) \|^2$  (22)

This is effectively an adaptive power normalization. The inventors have empirically determined that, with the use of such a step-size adaptation for the update algorithm, the resulting updates are stable and lead to faster convergence.
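A sketch of the normalized step of equations 21 and 22 follows. The constant factor in equation 22 is here absorbed into μ, and a small epsilon guards against division by zero in silent frequency bins; both choices, like the names, are assumptions of the sketch.

    import numpy as np

    def normalized_update(W, R_hat, Lam_s, mu, eps=1e-12):
        """Gradient step with adaptive power normalization (equations 21-22)."""
        J = W @ R_hat @ W.conj().transpose(0, 2, 1) - Lam_s
        grad = 2 * J @ W @ R_hat                          # dJ/dW* per bin
        h = np.sum(np.abs(W @ R_hat) ** 2, axis=(1, 2))   # equation 22 per bin
        return W - mu * grad / (h[:, None, None] + eps)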

(4) Elimination of Unwanted Filter Coefficients

The filter parameter values, W, may include unwanted coefficients that actually reduce the efficacy of the filter in separating out the desired signal. Thus, in a preferred embodiment, at each update iteration, those unwanted coefficients are eliminated by a temporary conversion of the frequency domain W(ω) values (using inverse DFT) back to the time domain. There, the filter solutions are restricted to those filters that have no time response beyond τ > Q, where Q << T. Effectively, the routine parameterizes T d_s d_x filter coefficients in W(ω) with Q d_s d_x parameters W(τ). In practice, the values of W are produced in the frequency domain and an inverse DFT is performed on these frequency domain values to convert the values to the time domain. In the time domain, any W value that appears at a time greater than a time Q is set to zero and all values in the range below Q are not adjusted. The adjusted time domain values of W are then converted using DFT back to the frequency domain. By zeroing the filter response in the time domain for all time greater than Q, the frequency response of the filter is smoothed such that a unique solution at each frequency is readily determined.

(5) Operation Of The Decorrelation Algorithm

As already described in connection with the flow chart of FIG. 2, the decorrelation algorithm of the invention is implemented as a block processing procedure. The signals are windowed and transformed into the frequency domain, i.e., the segment x_i(t), . . . , x_i(t + T − 1) gives frequency components x_i(t, ω) for ω = 0, . . . , T − 1. These are used to compute the estimated cross-correlations directly in the frequency domain, i.e., $\hat{R}_x(t, \omega) = E[x(t, \omega)\, x^H(t, \omega)]$. In an on-line algorithm, the expectation operation is typically implemented as an exponentially windowed average of the past. For the cross-correlations of the observations in the frequency domain this reads:

$\hat{R}_x(t, \omega) = (1 - \gamma)\, \hat{R}_x(t, \omega) + \gamma\, x(t, \omega)\, x^H(t, \omega)$  (23)

where γ represents a forgetting factor to be chosen according to the stationarity time of the signal. Also at time t, a gradient step, as shown in Equation 21, is performed with the current estimate of $\hat{R}_x(t, \omega)$. As the updates are computed in the frequency domain, it is quite natural to implement the actual filtering with an overlap-save routine. For this, one takes signal windows of size 2T and steps by T samples to obtain the next block of data. These blocks of data can be used for both the gradient and estimation updates. However, it may be necessary to perform more gradient steps before visiting new data. That is, one may want to update more frequently within that time T. In general, for a given frame rate r, one would compute the 2T frequency components and update W(ω) and $\hat{R}_x(t, \omega)$ at t = T/r, 2T/r, 3T/r, . . . Conventional overlap-and-save corresponds to r = 1.
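Putting these pieces together, the following simplified sketch runs the on-line procedure with non-overlapping T-sample blocks and a frame rate of r = 1; the 2T-point overlap-save variant is omitted for brevity, the optimal Λ̂_s is taken as the diagonal of W R̂_x W^H at each step, and all names and default constants are illustrative assumptions.

    import numpy as np

    def mad_online(x, T, Q, mu=0.01, gamma=0.1):
        """Minimal on-line multiple adaptive decorrelation loop.

        x: (d, n_samples) mixed signals; returns (T, d, d) filters W(w)."""
        d, n = x.shape
        W = np.tile(np.eye(d, dtype=complex), (T, 1, 1))
        R = np.zeros((T, d, d), dtype=complex)
        for start in range(0, n - T + 1, T):
            X = np.fft.fft(x[:, start:start + T], axis=1)      # (d, T)
            XX = np.einsum('iw,jw->wij', X, X.conj())
            R = (1 - gamma) * R + gamma * XX                    # equation 23
            WRW = W @ R @ W.conj().transpose(0, 2, 1)
            diag = np.diagonal(WRW, axis1=1, axis2=2)
            J = WRW - np.eye(d) * diag[:, None, :]              # off-diagonal error
            h = np.sum(np.abs(W @ R) ** 2, axis=(1, 2)) + 1e-12
            W = W - mu * (2 * J @ W @ R) / h[:, None, None]     # equations 20-22
            W_t = np.fft.ifft(W, axis=0)
            W_t[Q + 1:] = 0                                     # constrain W(tau)
            W = np.fft.fft(W_t, axis=0)
        return W

The separated signals would then be obtained by filtering each incoming block with the current W, e.g., via an overlap-save routine as described above.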

III. System Embodiment

FIG. 5 depicts a system for implementing the source separation method of either the off-line or on-line embodiments of the BSS methodology of the invention. The system comprises a convolved signal source 526 that supplies the signal that is to be separated into its component signals and a computer system 508 that executes the multiple decorrelation routine 524 of the present invention. The source 526 may contain any source of convolved signals, but is illustratively shown to contain a sensor array 502 and a signal processor 504. The sensor array contains one or more transducers 502A, 502B, 502C, such as microphones. The transducers are coupled to a signal processor 504 that performs signal digitization. A digital signal is coupled to the computer system 508 for signal separation and further processing.

The computer system 508 comprises a central processing unit (CPU) 514, a memory 522, support circuits 516, and an input/output (I/O) interface 520. The computer system 508 is generally coupled through the I/O interface 520 to a display 512 and various input devices 510, such as a mouse and keyboard. The support circuits generally contain well-known circuits such as cache, power supplies, clock circuits, a communications bus, and the like. The memory 522 may include random access memory (RAM), read only memory (ROM), disk drive, tape drive, and the like, or some combination of memory devices. The invention is implemented as the multiple decorrelation routine 524 that is stored in memory 522 and executed by the CPU 514 to process the signal from the signal source 526. As such, the computer system 508 is a general purpose computer system that becomes a specific purpose computer system when executing the routine 524 of the present invention. Although a general purpose computer system is illustratively shown as a platform for implementing the invention, those skilled in the art will realize that the invention can also be implemented in hardware as an application specific integrated circuit (ASIC), a digital signal processing (DSP) integrated circuit, or other hardware device or devices. As such, the invention may be implemented in software, hardware, or a combination of software and hardware.

The illustrative computer system 508 may also contain a speech recognition processor 518, e.g., a speech recognition circuit card or speech recognition software, that is used to process the component signals that the invention extracts from the convolutive signal. As such, a conference room having a plurality of people speaking and background noise can be monitored with multiple microphones 502. The microphones 502 produce a composite speech signal that requires separation into component signals if a speech recognition system is to be used to convert each person's speech into computer text or into computer commands. The composite speech signal is filtered, amplified and digitized by the signal processor 504 and coupled to the computer system 508. The CPU 514, executing the multiple decorrelation routine 524, separates the composite signal into its constituent signal components. From these constituent components, background noise can easily be removed. The constituent components without noise would then be coupled to the speech recognition processor 518 to process the component signals into computer text or computer commands. In this manner, the computer system 508, while executing the multiple decorrelation routine 524, is performing signal pre-processing or conditioning for the speech recognition processor 518.

Although various embodiments which incorporate the teachings of the present invention have been shown and described in detail herein, those skilled in the art can readily devise many other varied embodiments that still incorporate these teachings.

Accordingly, it is intended that all such alternatives, modifications, permutations, and variations to the exemplary embodiments can be made without departing from the scope and spirit of the present invention.

1. A method for separating a plurality of mixed signals into a plurality of component signals comprising the steps of: (a) producing a plurality of discrete Fourier transform (DFT) values corresponding to frequency components of an input segment of said mixed signals; (b) updating cross correlation estimation matrices using said DFT values; (c) computing, using a cost-function minimization process, an update value for a plurality of filter coefficients for a finite impulse response (FIR) filter using said cross correlation estimation values; (d) filtering said mixed signals using said FIR filter having said updated filter coefficients to separate said mixed signals into one or more component signals; and (e) iteratively repeating steps (a), (b), (c) and (d) for successive input segments of said mixed signal.
 2. The method of claim 1 wherein step (c) further comprises the substeps of: (c1) transforming said update filter coefficients from the frequency domain into the time domain; (c2) zeroing any filter coefficients having a value other than zero for any time that is greater than a predefined time Q, where Q is less than a value of an input-segment length, T, thereby producing a set of constrained time domain filter coefficients; and (c3) transforming said adjusted time domain filter coefficients from the time domain back into the frequency domain.
 3. The method of claim 1 wherein said cost-function minimization process is a gradient descent process.
 4. The method of claim 1 wherein said computation of said filter coefficient update values further includes an adaptation of update step sizes based on a normalization factor.
 5. The method of claim 4 wherein said update values, including said update step-size adaptation, are computed according to $\Delta_t W(\omega) = -(\mu / h)\, \nabla_W E$, where μ is a fixed learning constant, h is a weighting factor for the step-size adaptation, and E is a filter-parameter cost function operating on a square difference respecting a diagonal covariance of said component signals.
 6. The method of claim 1 wherein a running average of said cross correlation estimation values is produced according to $\hat{R}_x(t, \omega) = (1 - \gamma)\, \hat{R}_x(t, \omega) + \gamma\, x(t, \omega)\, x^H(t, \omega)$, where x(t, ω) is the mixed signal in the frequency domain.
7. An apparatus for separating a plurality of mixed signals into a plurality of component signals comprising: means for producing a plurality of discrete Fourier transform (DFT) values corresponding to frequency components of an input segment of said mixed signals; means for updating cross correlation estimation matrices using said DFT values; a cost-function minimization processor for computing an update value for a plurality of filter coefficients for a finite impulse response (FIR) filter using said cross correlation estimation values; and an FIR filter having said updated filter coefficients to separate said mixed signals into one or more component signals.
 8. The apparatus of claim 7 wherein said cost-function minimization processor further comprises: a first transformer for transforming said update filter coefficients from the frequency domain into the time domain; means for zeroing any filter coefficients having a value other than zero for any time that is greater than a predefined time Q, where Q is less than a value of an input-segment length, T, thereby producing a set of constrained time domain filter coefficients; and a second transformer for transforming said adjusted time domain filter coefficients from the time domain back into the frequency domain.
 9. The apparatus of claim 8 wherein said first transformer uses an inverse Fourier transform and said second transformer uses a Fourier transform.
 10. The apparatus of claim 7 wherein said cost-function minimization processor carries out a gradient descent process.
 11. The apparatus of claim 7 wherein said computation of said filter coefficient update values further includes an adaptation of update step sizes based on a normalization factor.
 12. The apparatus of claim 11 wherein said update values, including said update step-size adaptation, are computed according to $\Delta_t W(\omega) = -(\mu / h)\, \nabla_W E$, where μ is a fixed learning constant, h is a weighting factor for the step-size adaptation, and E is a filter-parameter cost function operating on a square difference respecting a diagonal covariance of said component signals.
 13. The apparatus of claim 7 wherein a running average of said cross correlation estimation values is produced according to $\hat{R}_x(t, \omega) = (1 - \gamma)\, \hat{R}_x(t, \omega) + \gamma\, x(t, \omega)\, x^H(t, \omega)$, where x(t, ω) is the mixed signal in the frequency domain.
 14. The apparatus of claim 7 further comprising a voice recognition system for processing at least one of said component signals.
15. A computer readable storage medium containing a program that, when executed upon a general purpose computer system, causes said general purpose computer system to become a specific purpose computer system that performs a method for separating a plurality of mixed signals into a plurality of component signals comprising the steps of: (a) producing a plurality of discrete Fourier transform (DFT) values corresponding to frequency components of an input segment of said mixed signals; (b) updating cross correlation estimation matrices using said DFT values; (c) computing, using a cost-function minimization process, an update value for a plurality of filter coefficients for a finite impulse response (FIR) filter using said cross correlation estimation values; (d) filtering said mixed signals using said FIR filter having said updated filter coefficients to separate said mixed signals into one or more component signals; and (e) iteratively repeating steps (a), (b), (c) and (d) for successive input segments of said mixed signal.
 16. The computer readable medium of claim 15 wherein step (c) further comprises the substeps of: (c1) transforming said update filter coefficients from the frequency domain into the time domain; (c2) zeroing any filter coefficients having a value other than zero for any time that is greater than a predefined time Q, where Q is less than a value of an input-segment length, T, thereby producing a set of constrained time domain filter coefficients; and (c3) transforming said adjusted time domain filter coefficients from the time domain back into the frequency domain.
 17. The computer readable medium of claim 15 wherein said cost-function minimization process is a gradient descent process.
 18. The computer readable medium of claim 15 wherein said computation of said filter coefficient update values further includes an adaptation of update step sizes based on a normalization factor.
 19. The computer readable medium of claim 18 wherein said update values, including said update step-size adaptation, are computed according to $\Delta_t W(\omega) = -(\mu / h)\, \nabla_W E$, where μ is a fixed constant, h is a weighting factor for the step-size adaptation, E is a filter-parameter cost function operating on a square difference respecting a diagonal covariance of said component signals, and $\nabla_W E$ is a gradient step for E.
 20. The computer readable medium of claim 15 wherein a running average of said cross correlation estimation values is produced according to $\hat{R}_x(t, \omega) = (1 - \gamma)\, \hat{R}_x(t, \omega) + \gamma\, x(t, \omega)\, x^H(t, \omega)$, where x(t, ω) is the mixed signal in the frequency domain.